Setting up parameters of the analysis

=========================================================

Species to select from: “Homo sapiens”, “Mus musculus”, “strain ATCC 204508”, “strain K12”. Paste the value into the code chunk below.

SPECIES_NAME = "Homo sapiens"

`reviewed` selects the UniProt subset: all of UniProt if reviewed == 1, Swiss-Prot data only if reviewed == 2, TrEMBL data only if reviewed == 3. Only reviewed = 2 is relevant for this analysis. Paste the value into the code chunk below.

reviewed = 2

Whether to distinguish between isoforms (TRUE) or to use only generic UniProt accessions (FALSE). Only isoforms = FALSE is relevant for this analysis. Paste the value into the code chunk below.

isoforms = FALSE

Specify the date for which you want to perform the analysis (if not today). The “logic table” produced by the protein_properties script for that date is needed for this analysis. Filename example:
“proteome_vs_interactome_protein_properties_f_Homo sapiens_reviewed_2_isoforms_FALSE_2016-12-01.txt”

date = Sys.Date()             # default: today
date = as.Date("2016-12-01")  # overrides the default with the date of the saved logic table

Date of the analysis (i.e. the date of the UniProt protein list used): 2016-12-01
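
The parameters above determine which saved logic table is loaded later on. As a minimal sketch (the helper name `logic_table_filename` is hypothetical and not part of the original scripts), the filename can be assembled in the same way as the later `paste()` call:

```r
# Hypothetical helper, for illustration only: builds the logic-table path
# from the analysis parameters, mirroring the paste() call used later on.
logic_table_filename = function(species, reviewed, isoforms, date,
                                dir = "./analysis/") {
  paste0(dir, "proteome_vs_interactome_protein_properties_f_", species,
         "_reviewed_", reviewed, "_isoforms_", isoforms, "_", date, ".txt")
}

example_file = logic_table_filename("Homo sapiens", 2, FALSE,
                                    as.Date("2016-12-01"))
print(example_file)
# file.exists(example_file) tells whether the table for this date is present
```

Checking the path with `file.exists()` before running the rest of the script gives an early, readable failure if the logic table for the chosen date has not been generated yet.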

Species ID lookup

=========================================================

looking up the species ID in the UniProt readme file (always read from the URL)

source("SPECIES_NAME_TO_ID.R")
r = reviewed
i = isoforms
n = SPECIES_NAME
SPECIES_IDs = SPECIES_NAME_TO_ID(n)
## [1] "UP000005640 9606    HUMAN     21032   71899   93381  Homo sapiens (Human)"
##   Proteome_ID SPECIES_ID
## 1 UP000005640       9606
SPECIES_ID = SPECIES_IDs$SPECIES_ID
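
The internals of `SPECIES_NAME_TO_ID()` are not shown here. As a rough sketch only, assuming the readme line printed above is whitespace-delimited with the proteome ID and the taxonomy ID in the first two fields, the two values could be extracted like this (`SPECIES_IDs_sketch` is a hypothetical name):

```r
# Line copied from the output above; the parsing is an assumption about
# the layout, not the actual implementation of SPECIES_NAME_TO_ID().
readme_line = "UP000005640 9606    HUMAN     21032   71899   93381  Homo sapiens (Human)"

# Split on runs of whitespace and keep the first two fields
fields = strsplit(readme_line, "[[:space:]]+")[[1]]
SPECIES_IDs_sketch = data.frame(Proteome_ID = fields[1],
                                SPECIES_ID = fields[2],
                                stringsAsFactors = FALSE)
print(SPECIES_IDs_sketch)
```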

Querying for interactions detected by AP-MS or two-hybrid using PSICQUIC

=========================================================

Detection method explanation:

To change the set of databases queried, type in the exact database names (the list below, all IMEx databases, is the default for the function).

databases <- c("IntAct", "MINT", "bhf-ucl", "MPIDB", "MatrixDB", 
               "HPIDb","I2D-IMEx","InnateDB-IMEx", "MolCon", "UniProt", "MBInfo")

source("query_PSICQUIC_for_interactions.R")
twohybrids_all_interactions = query_PSICQUIC_for_interactions(SPECIES_ID = SPECIES_ID, 
                                                   SPECIES_NAME = SPECIES_NAME, 
                                                   databases = databases, date,
                                                   detmethod = "transcriptional complementation assay")
## [1] "loaded from file"
## [1] interactions for Homo sapiens, detmethod(transcriptional complementation assay), pmethod(NA): 
## [1] total number: 135332
## [1] there is no interactions in the databases: 
## [1] MPIDB    MatrixDB MBInfo  
## [1] the number of interactions per database 
##                                 database N of interactions
## 1                                                       53
## 2                 psi-mi:MI:0469(IntAct)            123945
## 3                   psi-mi:MI:0471(MINT)              9834
## 4                psi-mi:MI:0486(UniProt)               771
## 5                  psi-mi:MI:0903(mpidb)                 3
## 6                 psi-mi:MI:1222(mbinfo)                59
## 7                    psi-mi:MI:1262(I2D)                39
## 8  psi-mi:MI:1263(Molecular Connections)                41
## 9                psi-mi:MI:1332(bhf-ucl)                83
## 10                 psi-mi:MI:1335(HPIDb)               504
complementation_all_interactions = query_PSICQUIC_for_interactions(SPECIES_ID = SPECIES_ID, 
                                                              SPECIES_NAME = SPECIES_NAME, 
                                                              databases = databases, date,
                                                              detmethod = "protein complementation assay")
## [1] "loaded from file"
## [1] interactions for Homo sapiens, detmethod(protein complementation assay), pmethod(NA): 
## [1] total number: 139285
## [1] there is no interactions in the databases: 
## [1] MPIDB    MatrixDB MBInfo  
## [1] the number of interactions per database 
##                                 database N of interactions
## 1                                                       54
## 2                 psi-mi:MI:0469(IntAct)            126515
## 3                   psi-mi:MI:0471(MINT)             10329
## 4                psi-mi:MI:0486(UniProt)               877
## 5                  psi-mi:MI:0903(mpidb)                 3
## 6               psi-mi:MI:0974(InnateDB)                21
## 7                 psi-mi:MI:1222(mbinfo)                66
## 8                    psi-mi:MI:1262(I2D)                87
## 9  psi-mi:MI:1263(Molecular Connections)                50
## 10               psi-mi:MI:1332(bhf-ucl)               722
## 11                 psi-mi:MI:1335(HPIDb)               561
ap_ms_all_interactions = query_PSICQUIC_for_interactions(SPECIES_ID = SPECIES_ID, 
                                                         SPECIES_NAME = SPECIES_NAME, 
                                                         databases = databases, date,
                                                         detmethod = "affinity chromatography technology",
                                                         pmethod = "partial identification of protein sequence")
## [1] "loaded from file"
## [1] interactions for Homo sapiens, detmethod(affinity chromatography technology), pmethod(partial identification of protein sequence): 
## [1] total number: 88410
## [1] there is no interactions in the databases: 
## [1] MPIDB    MatrixDB MBInfo  
## [1] the number of interactions per database 
##                                database N of interactions
## 1                                                      92
## 2                psi-mi:MI:0469(IntAct)             85442
## 3                  psi-mi:MI:0471(MINT)               865
## 4               psi-mi:MI:0486(UniProt)              1241
## 5              psi-mi:MI:0974(InnateDB)                 2
## 6                   psi-mi:MI:1262(I2D)                72
## 7 psi-mi:MI:1263(Molecular Connections)                79
## 8               psi-mi:MI:1332(bhf-ucl)                47
## 9                 psi-mi:MI:1335(HPIDb)               570
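
The per-database tables above can be reproduced in spirit with `table()` on the source-database column. The sketch below uses invented toy values and hypothetical variable names; it is not the code inside `query_PSICQUIC_for_interactions()`. It also illustrates one possible reason why, in the first two queries, MPIDB and MBInfo are reported as empty although rows for `mpidb` and `mbinfo` appear in the tables: a case-sensitive comparison of database names.

```r
# Toy source-database column in the style of the output above
# (values invented for illustration).
source_db = c("psi-mi:MI:0469(IntAct)", "psi-mi:MI:0469(IntAct)",
              "psi-mi:MI:0903(mpidb)", "psi-mi:MI:0471(MINT)")

# Number of interactions per database
per_db = as.data.frame(table(database = source_db),
                       responseName = "N_of_interactions",
                       stringsAsFactors = FALSE)
print(per_db)

# Short database name = text inside the trailing parentheses
db_short = sub(".*\\((.*)\\)$", "\\1", per_db$database)

requested = c("IntAct", "MINT", "MPIDB", "MatrixDB")
# Case-sensitive comparison flags MPIDB as empty although "mpidb" is present
missing_case_sensitive = setdiff(requested, db_short)
# Case-insensitive comparison only flags databases that are really absent
missing_case_insensitive = requested[!tolower(requested) %in% tolower(db_short)]
print(missing_case_sensitive)
print(missing_case_insensitive)
```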

Cleaning and transforming data

=========================================================

extracting interactor IDs from interactions (MI-TAB 2.5)

source("interactions_to_interactors.R")
twohybrids_all_interactors = interactions_to_interactors(twohybrids_all_interactions)
complementation_all_interactors = interactions_to_interactors(complementation_all_interactions)
ap_ms_all_interactors = interactions_to_interactors(ap_ms_all_interactions)
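
In MI-TAB 2.5, columns 1 and 2 hold the identifiers of interactors A and B, and columns 10 and 11 hold their taxonomy IDs. A minimal sketch of turning interactions into a list of interactors, assuming a toy table with hypothetical column names (`idA`, `idB`, `taxidA`, `taxidB`) and invented accessions, is to stack the A and B columns; the actual `interactions_to_interactors()` may differ:

```r
# Toy MI-TAB-like table; identifiers are invented for illustration.
mitab = data.frame(
  idA    = c("uniprotkb:P11111", "uniprotkb:P22222-2"),
  idB    = c("uniprotkb:P22222", "chebi:CHEBI:00000"),
  taxidA = c("taxid:9606(human)", "taxid:9606(human)"),
  taxidB = c("taxid:9606(human)", "taxid:-2(chemical synthesis)"),
  stringsAsFactors = FALSE)

# Each interaction contributes two interactors: stack A on top of B
interactors = data.frame(
  interactor_IDs = c(mitab$idA, mitab$idB),
  taxid          = c(mitab$taxidA, mitab$taxidB),
  stringsAsFactors = FALSE)
print(interactors)
```

At this stage duplicates are kept on purpose; uniqueness is enforced later, when the indicator columns for the logic table are prepared.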

filtering interactors to keep only those with UniProtKB identifiers and only proteins from SPECIES_ID

source("uniprotkb_and_SPECIES_ID_interactor_selector.R")
twohybrids_all_interactors_SPECIES_ID_only = uniprotkb_and_SPECIES_ID_interactor_selector(twohybrids_all_interactors, SPECIES_ID)
##                           all interactors 
##                                   "18408" 
## interactors with the UniprotKB identifier 
##                                   "17969" 
##     interactors with the other identifier 
##                                     "439" 
##                                SPECIES_ID 
##                                    "9606" 
##                    SPECIES_ID interactors 
##                                   "13933" 
##            interactors from other species 
##                                    "4060"
complementation_all_interactors_SPECIES_ID_only = uniprotkb_and_SPECIES_ID_interactor_selector(complementation_all_interactors, SPECIES_ID)
##                           all interactors 
##                                   "18985" 
## interactors with the UniprotKB identifier 
##                                   "18479" 
##     interactors with the other identifier 
##                                     "506" 
##                                SPECIES_ID 
##                                    "9606" 
##                    SPECIES_ID interactors 
##                                   "14319" 
##            interactors from other species 
##                                    "4185"
ap_ms_all_interactors_SPECIES_ID_only = uniprotkb_and_SPECIES_ID_interactor_selector(ap_ms_all_interactors, SPECIES_ID)
##                           all interactors 
##                                   "13375" 
## interactors with the UniprotKB identifier 
##                                   "12456" 
##     interactors with the other identifier 
##                                     "919" 
##                                SPECIES_ID 
##                                    "9606" 
##                    SPECIES_ID interactors 
##                                   "10766" 
##            interactors from other species 
##                                    "1732"
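
The selection step can be sketched as two logical filters, one on the identifier prefix and one on the taxonomy ID. This is an illustration under assumed column names and MI-TAB-style values (`uniprotkb:` prefix, `taxid:9606(human)`), with invented accessions, not the body of `uniprotkb_and_SPECIES_ID_interactor_selector()`:

```r
SPECIES_ID = 9606

# Toy interactor table; identifiers are invented for illustration.
interactors = data.frame(
  interactor_IDs = c("uniprotkb:P11111", "uniprotkb:P33333", "chebi:CHEBI:00000"),
  taxid          = c("taxid:9606(human)", "taxid:10090(mouse)",
                     "taxid:-2(chemical synthesis)"),
  stringsAsFactors = FALSE)

# Keep UniProtKB identifiers only
is_uniprot = grepl("^uniprotkb:", interactors$interactor_IDs)
# Keep the requested species only; "\\(" anchors the end of the number so
# that 9606 does not also match a longer ID starting with the same digits
is_species = grepl(paste0("^taxid:", SPECIES_ID, "\\("), interactors$taxid)

selected = interactors[is_uniprot & is_species, ]
selected$interactor_IDs = sub("^uniprotkb:", "", selected$interactor_IDs)
print(selected)
```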

Removing all isoform IDs (XXXXXX-X+ => XXXXXX) from IDs

  source("isoform_id_all_remover.R")
  twohybrids_all_interactors_SPECIES_ID_only$interactor_IDs = isoform_id_all_remover(twohybrids_all_interactors_SPECIES_ID_only$interactor_IDs)
  complementation_all_interactors_SPECIES_ID_only$interactor_IDs = isoform_id_all_remover(complementation_all_interactors_SPECIES_ID_only$interactor_IDs)
  ap_ms_all_interactors_SPECIES_ID_only$interactor_IDs = isoform_id_all_remover(ap_ms_all_interactors_SPECIES_ID_only$interactor_IDs)
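
The `XXXXXX-X+ => XXXXXX` rule corresponds to stripping a trailing hyphen followed by digits. A minimal sketch of that rule, with a hypothetical function name and invented accessions (the real `isoform_id_all_remover()` is not shown in this document):

```r
# Hypothetical stand-in for isoform_id_all_remover(): drops a trailing
# "-<digits>" isoform suffix and leaves generic accessions unchanged.
strip_isoform_suffix = function(ids) {
  sub("-[0-9]+$", "", ids)
}

ids = c("P11111-2", "P22222-11", "P11111")
generic_ids = strip_isoform_suffix(ids)
print(generic_ids)
```

After this step several rows can share the same accession, which is why `unique()` is applied before merging with the logic table.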

preparing interaction detection method data for logic table: selecting unique proteins and adding the column of ones

unique_twohybrids_interactors_SPECIES_ID_only = unique(cbind(twohybrids_all_interactors_SPECIES_ID_only[c("interactor_IDs")], 1))
colnames(unique_twohybrids_interactors_SPECIES_ID_only)[2] = "two_hybrid"
unique_complementation_interactors_SPECIES_ID_only = unique(cbind(complementation_all_interactors_SPECIES_ID_only[c("interactor_IDs")], 1))
colnames(unique_complementation_interactors_SPECIES_ID_only)[2] = "all_protein_complementation"
unique_ap_ms_interactors_SPECIES_ID_only = unique(cbind(ap_ms_all_interactors_SPECIES_ID_only[c("interactor_IDs")], 1))
colnames(unique_ap_ms_interactors_SPECIES_ID_only)[2] = "AP_MS"

loading logic table made by “swissprot_vs_imex_protein_properties” script

filename_vs_2 = paste("./analysis/","proteome_vs_interactome_protein_properties_f_", n,"_reviewed_",r,"_isoforms_",i,"_", date,".txt", sep = "")
proteome_vs_imex_details_f = as.data.frame(read.delim(filename_vs_2, header = T, stringsAsFactors = F,quote=""))
proteome_vs_imex_details_f$whole_proteome_Uniprot_IMEx = factor(proteome_vs_imex_details_f$whole_proteome_Uniprot_IMEx, ordered =F)

merging new results with the logic table

proteome_vs_imex_interaction_details_t1 = merge(proteome_vs_imex_details_f, 
                                   unique_twohybrids_interactors_SPECIES_ID_only, 
                                   by.x = "whole_proteome_IDs",
                                   by.y = "interactor_IDs",
                                   all.x = T, all.y = F)
proteome_vs_imex_interaction_details_t2 = merge(proteome_vs_imex_interaction_details_t1, 
                                               unique_complementation_interactors_SPECIES_ID_only, 
                                               by.x = "whole_proteome_IDs",
                                               by.y = "interactor_IDs",
                                               all.x = T, all.y = F)
proteome_vs_imex_interaction_details_f = merge(proteome_vs_imex_interaction_details_t2, 
                                               unique_ap_ms_interactors_SPECIES_ID_only, 
                                                by.x = "whole_proteome_IDs",
                                                by.y = "interactor_IDs",
                                                all.x = T, all.y = F)
proteome_vs_imex_interaction_details_f[is.na(proteome_vs_imex_interaction_details_f)] = 0
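
The merge above is a left join: every protein in the logic table is kept, and proteins not detected by a method get `NA`, which is then set to 0. A toy sketch of the same pattern follows (data invented for illustration). Note that `df[is.na(df)] = 0` replaces missing values in every column, including any protein property that was already `NA`; the sketch restricts the replacement to the indicator column, which is an alternative rather than what the script does.

```r
# Toy logic table and toy list of two-hybrid interactors
proteome = data.frame(whole_proteome_IDs = c("P1", "P2", "P3"),
                      Mass = c(10000, NA, 30000),
                      stringsAsFactors = FALSE)
hits = data.frame(interactor_IDs = c("P1", "P3", "P3"),
                  stringsAsFactors = FALSE)

# Unique proteins plus a column of ones, as in the step above
hits = unique(cbind(hits["interactor_IDs"], two_hybrid = 1))

# Left join: all.x = TRUE keeps every protein of the proteome
merged = merge(proteome, hits,
               by.x = "whole_proteome_IDs", by.y = "interactor_IDs",
               all.x = TRUE, all.y = FALSE)

# Replace NA by 0 in the indicator column only; Mass keeps its NA
merged$two_hybrid[is.na(merged$two_hybrid)] = 0
print(merged)
```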

adding a factor variable that combines two_hybrid and AP_MS

proteome_vs_imex_interaction_details_f[,length(proteome_vs_imex_interaction_details_f)+1] = interaction(proteome_vs_imex_interaction_details_f$two_hybrid, proteome_vs_imex_interaction_details_f$AP_MS, sep = "_")
colnames(proteome_vs_imex_interaction_details_f)[length(proteome_vs_imex_interaction_details_f)] = paste0("two_hybrid", "_vs_","AP_MS")
levels(proteome_vs_imex_interaction_details_f$two_hybrid_vs_AP_MS) = c("not_two_hybrid_and_not_AP_MS", "two_hybrid_not_AP_MS","not_two_hybrid_but_AP_MS", "two_hybrid_and_AP_MS")
proteome_vs_imex_interaction_details_f$two_hybrid_vs_AP_MS = as.character(proteome_vs_imex_interaction_details_f$two_hybrid_vs_AP_MS)
proteome_vs_imex_interaction_details_f$two_hybrid_vs_AP_MS[proteome_vs_imex_interaction_details_f$IMEx!=1] = "not_in_IMEx"
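
Renaming the levels by position relies on the order in which `interaction()` creates them: with the default `lex.order = FALSE` the first factor varies fastest, giving `0_0`, `1_0`, `0_1`, `1_1`. The toy sketch below checks this and shows a name-based mapping as an alternative (the `level_names` vector is an illustration, not part of the original script). A name-based mapping does not depend on all four combinations being present in the data.

```r
# Toy indicator columns covering all four combinations
two_hybrid = c(0, 1, 0, 1)
AP_MS      = c(0, 0, 1, 1)

combo = interaction(two_hybrid, AP_MS, sep = "_")
print(levels(combo))  # first factor (two_hybrid) varies fastest

# Name-based mapping: each "<two_hybrid>_<AP_MS>" code gets its own label
level_names = c("0_0" = "not_two_hybrid_and_not_AP_MS",
                "1_0" = "two_hybrid_not_AP_MS",
                "0_1" = "not_two_hybrid_but_AP_MS",
                "1_1" = "two_hybrid_and_AP_MS")
combo_label = unname(level_names[as.character(combo)])
print(combo_label)
```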

saving combined logic table with protein properties from Uniprot and interaction properties combined

filename = paste("./analysis/","proteome_vs_interactome_interaction_properties_f_", n,"_reviewed_",r,"_isoforms_",i,"_", date,".txt", sep = "")
write.table(proteome_vs_imex_interaction_details_f,filename,col.names=T,row.names=F,sep="\t",quote=F)

Results

=========================================================

read the table saved before - script can be started from here

filename = paste("./analysis/","proteome_vs_interactome_interaction_properties_f_", n,"_reviewed_",r,"_isoforms_",i,"_", date,".txt", sep = "")
proteome_vs_imex_interaction_details_f = as.data.frame(read.delim(filename, header = T, stringsAsFactors = F,quote=""))
proteome_vs_imex_interaction_details_f$whole_proteome_Uniprot_IMEx = factor(proteome_vs_imex_interaction_details_f$whole_proteome_Uniprot_IMEx, ordered =F)
proteome_vs_imex_interaction_details_f$two_hybrid_vs_AP_MS = factor(proteome_vs_imex_interaction_details_f$two_hybrid_vs_AP_MS, ordered =F)

The density and the histogram of the protein mass (overlay for different detection methods)

=========================================================

library(ggplot2)
ggplot(proteome_vs_imex_interaction_details_f, aes(x = Mass, color =two_hybrid_vs_AP_MS, 
                                                   fill = two_hybrid_vs_AP_MS)) +
  geom_density(alpha =0.2) + scale_x_log10() +
  xlab("protein mass, Da, log10 scale")+
  ggtitle("density plot of the protein mass (overlay for different detection methods)") #+facet_grid(two_hybrid_vs_AP_MS~Organism)

ggplot(proteome_vs_imex_interaction_details_f, aes(x = Mass,color = two_hybrid_vs_AP_MS)) +
  scale_x_log10() + geom_histogram(position = "identity", bins = 50,alpha =0.1) +
  xlab("protein mass, Da, log10 scale") + 
  ggtitle("histogram of the protein mass (overlay for different detection methods)")

#+facet_grid(two_hybrid_vs_AP_MS~Organism)

How does the protein interaction detection method depend on protein mass (a proxy for protein length)? Violin plots

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(scales) 
## adding two_hybrid and AP_MS levels to show non-combinatory relationship (not 
## excluding proteins present in both)
two_hybrid = filter(proteome_vs_imex_interaction_details_f, two_hybrid == 1)
two_hybrid$two_hybrid_vs_AP_MS = "two_hybrid"
AP_MS = filter(proteome_vs_imex_interaction_details_f, AP_MS == 1)
AP_MS$two_hybrid_vs_AP_MS = "AP_MS"
for_plot = rbind(proteome_vs_imex_interaction_details_f, two_hybrid, AP_MS)
for_plot$two_hybrid_vs_AP_MS = factor(for_plot$two_hybrid_vs_AP_MS, ordered = F)
## Calculating the median protein mass for each group
yy = split(for_plot, for_plot$two_hybrid_vs_AP_MS)
Mass_median = sapply(yy, function(x){median(log10(x$Mass))})
kDa_median_mass = (10^Mass_median)/1000
plot_labels = paste0(gsub("_"," ",levels(for_plot$two_hybrid_vs_AP_MS)),", \n (median mass: ", signif(kDa_median_mass,3)," kDa)")
## Violin plot - proportional
ggplot(for_plot, aes(y = Mass, x =two_hybrid_vs_AP_MS, fill = two_hybrid_vs_AP_MS)) +
  geom_violin(draw_quantiles = c(0.05,0.25,0.495,0.5,0.505,0.75,0.95),scale = "count", alpha =0.7) +
  scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x),
                labels = trans_format("log10", math_format(10^.x))) +
  scale_x_discrete(labels = plot_labels)+
  geom_abline(slope = 0, intercept = Mass_median, alpha =0.1) +
  ylab("Mass, Da, log10 scale")+
  xlab("presence in IMEx, detection method")+
  ggtitle("How does protein interaction detection method depend on the protein length?",
          subtitle = "violin areas are scaled proportionally to the number of observations")+
  coord_flip()#+facet_grid(two_hybrid_vs_AP_MS~Organism)

## Violin plot - not proportional
ggplot(for_plot, aes(y = Mass, x =two_hybrid_vs_AP_MS, fill = two_hybrid_vs_AP_MS)) +
  geom_violin(draw_quantiles = c(0.05,0.25,0.495,0.5,0.505,0.75,0.95),scale = "area", alpha =0.7) +
  scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x),
                labels = trans_format("log10", math_format(10^.x))) +
  scale_x_discrete(labels = plot_labels)+
  geom_abline(slope = 0, intercept = Mass_median, alpha =0.1) +
  ylab("Mass, Da, log10 scale")+
  xlab("presence in IMEx, detection method")+
  ggtitle("How does protein interaction detection method depend on the protein length?",
          subtitle = "all violins have the same area")+
  coord_flip()#+facet_grid(two_hybrid_vs_AP_MS~Organism)

print(signif(data.frame("mass_median(log10.Da)" = Mass_median, "mass_median(kDa)" = (10^Mass_median)/1000),3))
##                              mass_median.log10.Da. mass_median.kDa.
## not_in_IMEx                                   4.56             36.7
## not_two_hybrid_and_not_AP_MS                  4.70             50.0
## not_two_hybrid_but_AP_MS                      4.76             57.3
## two_hybrid_and_AP_MS                          4.75             56.2
## two_hybrid_not_AP_MS                          4.63             42.6
## two_hybrid                                    4.70             50.0
## AP_MS                                         4.75             56.6
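
The medians in the table are computed on the log10 scale and then converted back to kDa. A toy sketch of that calculation with invented masses is shown below. For a group with an odd number of proteins the result equals the median of the raw masses; for an even number it is the geometric mean of the two middle values.

```r
# Toy data: two groups with three masses (in Da) each
toy = data.frame(group = c("a", "a", "a", "b", "b", "b"),
                 Mass  = c(10000, 40000, 160000, 20000, 50000, 100000),
                 stringsAsFactors = FALSE)

# Same steps as above: split by group, take the median of log10(Mass),
# then back-transform and convert Da to kDa
yy = split(toy, toy$group)
Mass_median = sapply(yy, function(x) median(log10(x$Mass)))
kDa_median_mass = (10^Mass_median) / 1000
print(signif(kDa_median_mass, 3))
```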

Let’s select all proteins in the 32-42 kDa window around the 36.7 kDa peak of the group not represented in IMEx

proteome_vs_imex_interaction_details_f.not_in_IMEx = dplyr::filter(
  proteome_vs_imex_interaction_details_f, two_hybrid_vs_AP_MS == "not_in_IMEx")
# keep proteins with 32000 < Mass < 42000 (both conditions in one filter call)
proteome_vs_imex_interaction_details_f.32.42 = dplyr::filter(
  proteome_vs_imex_interaction_details_f.not_in_IMEx, 
  Mass > 32000, Mass < 42000)
#proteome_vs_imex_interaction_details_f.32.42.biogrid = dplyr::filter(
#  proteome_vs_imex_interaction_details_f.32.42, 
#  proteome_vs_imex_interaction_details_f.32.42$BioGRID_from_Mentha == 0)
#proteome_vs_imex_interaction_details_f.32.42.biogrid.annotation5.5 = dplyr::filter(
#  proteome_vs_imex_interaction_details_f.32.42.biogrid, 
#  proteome_vs_imex_interaction_details_f.32.42.biogrid$Annotation == "5 out of 5")
write(proteome_vs_imex_interaction_details_f.32.42$whole_proteome_IDs, "proteome_vs_imex_interaction_details_f.32.42.txt")
#table(proteome_vs_imex_interaction_details_f.not_in_IMEx)
#swissprot = filter(proteome_vs_imex_interaction_details_f,
#       proteome_vs_imex_interaction_details_f$whole_proteome_Uniprot == 1)
#write(swissprot$whole_proteome_IDs, "swissprot.txt")

A quick GO enrichment analysis in Cytoscape ClueGO showed that a large part of the missing human interactome consists of odorant receptors (the 36 kDa spike), as well as bitter taste receptors (~20 genes) and membrane transporters.

GO terms describing missing proteins

cell.process = EBImage::readImage('./results/proteome_vs_imex_interaction_details_f.not_in_IMEx.cell.process.png') 
EBImage::display(cell.process)

detailed GO enrichment results available here - “./results/proteome_vs_imex_details_f.not_in_IMEx.ClueGO/”

Removed olfactory receptors

## removing olfactory receptors (logical indexing with grepl keeps all rows when nothing matches):
proteome_vs_imex_details_f_minus_odor = proteome_vs_imex_interaction_details_f[!grepl("Odor", proteome_vs_imex_interaction_details_f$Protein.names),]
proteome_vs_imex_details_f_minus_odor_olf = proteome_vs_imex_details_f_minus_odor[!grepl("Olfactory", proteome_vs_imex_details_f_minus_odor$Protein.names),]
## note: a protein whose name matches both patterns is counted twice in the sum below
print(paste("number of proteins containing \"Odor\" or \"Olfactory\" in protein name: ", length(grep("Odor", proteome_vs_imex_interaction_details_f$Protein.names)) +
  length(grep("Olfa", proteome_vs_imex_interaction_details_f$Protein.names))))
## [1] "number of proteins containing \"Odor\" or \"Olfactory\" in protein name:  438"
## density without olfactory receptors:
#ggplot(proteome_vs_imex_details_f_minus_odor_olf, aes(x = Mass, color = whole_proteome_Uniprot_IMEx, alpha =0.5)) +geom_density()+ scale_x_log10()
two_hybrid = dplyr::filter(proteome_vs_imex_details_f_minus_odor_olf, two_hybrid == 1)
two_hybrid$two_hybrid_vs_AP_MS = "two_hybrid"
AP_MS = dplyr::filter(proteome_vs_imex_details_f_minus_odor_olf, AP_MS == 1)
AP_MS$two_hybrid_vs_AP_MS = "AP_MS"
for_plot = rbind(proteome_vs_imex_details_f_minus_odor_olf, two_hybrid, AP_MS)
for_plot$two_hybrid_vs_AP_MS = factor(for_plot$two_hybrid_vs_AP_MS, ordered = F)
## Calculating the median protein mass for each group
yy = split(for_plot, for_plot$two_hybrid_vs_AP_MS)
Mass_median = sapply(yy, function(x){median(log10(x$Mass))})
kDa_median_mass = (10^Mass_median)/1000
plot_labels = paste0(gsub("_"," ",levels(for_plot$two_hybrid_vs_AP_MS)),", \n (median mass: ", signif(kDa_median_mass,3)," kDa)")
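
Two details of the olfactory receptor removal above are worth illustrating on toy data (protein names invented for the example). First, summing two separate `grep()` counts counts a protein twice when its name matches both patterns, whereas a single alternation pattern counts each protein once. Second, negative indexing with `grep()` returns nothing at all when the pattern has no match, while logical indexing with `grepl()` keeps everything.

```r
# Toy protein names, invented for illustration
protein_names = c("Olfactory receptor A (Odorant receptor A)",
                  "Odorant-binding protein B",
                  "Kinase C")

# Sum of separate counts: the first name matches both patterns
n_sum = length(grep("Odor", protein_names)) +
  length(grep("Olfa", protein_names))

# One alternation pattern: each protein is counted once
is_olfactory = grepl("Odor|Olfactory", protein_names)
n_union = sum(is_olfactory)
kept = protein_names[!is_olfactory]
print(c(n_sum = n_sum, n_union = n_union))

# Pitfall: no match means -integer(0), which selects zero elements
no_match_negative = protein_names[-grep("Taste", protein_names)]
no_match_logical  = protein_names[!grepl("Taste", protein_names)]
print(length(no_match_negative))
print(length(no_match_logical))
```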

## Violin plot - proportional
ggplot(for_plot, aes(y = Mass, x =two_hybrid_vs_AP_MS, fill = two_hybrid_vs_AP_MS)) +
  geom_violin(draw_quantiles = c(0.05,0.25,0.495,0.5,0.505,0.75,0.95),scale = "count", alpha =0.7) +
  scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x),
                labels = trans_format("log10", math_format(10^.x))) +
  scale_x_discrete(labels = plot_labels)+
  geom_abline(slope = 0, intercept = Mass_median, alpha =0.1) +
  ylab("Mass, Da, log10 scale")+
  xlab("presence in IMEx, detection method")+
  ggtitle("How does protein interaction detection method depend on the protein length?",
          subtitle = "violin areas are scaled proportionally to the number of observations")+
  coord_flip()#+facet_grid(two_hybrid_vs_AP_MS~Organism)

## Violin plot - not proportional
ggplot(for_plot, aes(y = Mass, x =two_hybrid_vs_AP_MS, fill = two_hybrid_vs_AP_MS)) +
  geom_violin(draw_quantiles = c(0.05,0.25,0.495,0.5,0.505,0.75,0.95),scale = "area", alpha =0.7) +
  scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x),
                labels = trans_format("log10", math_format(10^.x))) +
  scale_x_discrete(labels = plot_labels)+
  geom_abline(slope = 0, intercept = Mass_median, alpha =0.1) +
  ylab("Mass, Da, log10 scale")+
  xlab("presence in IMEx, detection method")+
  ggtitle("How does protein interaction detection method depend on the protein length?",
          subtitle = "all violins have the same area")+
  coord_flip()#+facet_grid(two_hybrid_vs_AP_MS~Organism)

# creating a list of missing proteins for ClueGO
proteome_vs_imex_details_f_minus_odor_olf.not_in_IMEx = dplyr::filter(
  proteome_vs_imex_details_f_minus_odor_olf, two_hybrid_vs_AP_MS == "not_in_IMEx")
write(proteome_vs_imex_details_f_minus_odor_olf.not_in_IMEx$whole_proteome_IDs, "proteome_vs_imex_interaction_details_f.not_in_IMEx_olfac_removed.txt")

GO terms describing missing proteins (after removing olfactory receptors)

cell.process.olf.removed = EBImage::readImage('./results/proteome_vs_imex_interaction_details_f.not_in_IMEx.ofl_removed.cell.process.png') 
EBImage::display(cell.process.olf.removed)

detailed GO enrichment results available here - “./results/proteome_vs_imex_details_f_minus_odor_olf.not_in_IMEx_ClueGO/”

#image0 = proteome_vs_imex_interaction_details_f[,c(2:3,6:18,29:31)]
#image = as.data.frame(lapply(image0, function(x){as.numeric(x)}))
#rafalib::mypar()
#rafalib::imagemat(image)
#colnames(image)
image1 = proteome_vs_imex_interaction_details_f[,c(1:3,6:18,30:32)]
image1.melt = reshape2::melt(image1)
## Using whole_proteome_IDs as id variables
colnames(image1.melt) = c("whole_proteome_IDs", "group", "presence")
ggplot(image1.melt, aes(group, whole_proteome_IDs)) + geom_tile(aes(fill = presence)) +  scale_fill_gradient(low = "white", high = "black") + theme(axis.text.x = element_text(size = 8, angle = 10, margin = margin(t=8)))